feat(env): integrate CodeDebug environment and vLLM inference stabilization by RUFFY-369 · Pull Request #3448 · NousResearch/hermes-agent

RUFFY-369 · 2026-03-27T20:30:53Z

Note

Research Context: This PR integrates the physical environment provided in atropos (PR #421) and establishes the foundation for the MT-GRPO reward infrastructure in PR #3451.

What does this PR do?

This PR integrates the CodeDebug environment for agentic reasoning and introduces a Universal Tool Strategy to stabilize inference on vLLM backends.

It solves two primary problems:

- Lack of Agentic Debugging Benchmarks: Provides a high-fidelity sandbox where agents iteratively debug code using real terminal and file tools.
- vLLM Tool Parsing Bugs: Implements a client-side tool parsing fallback in model_tools.py and environments/agent_loop.py to work around known bugs in vLLM 0.6.5 that cause 400/500 errors during complex tool-use sessions.

Related Issue

Fixes # (Initial integration of debugging environment)

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

environments/agent_loop.py: Added client-side tool parsing loop and atropos_inhibit_tools support
model_tools.py: Added parse_tool_calls_from_text (regex-based fallback parser)
environments/code_debug_env/code_debug_env.py: Environment logic based on HumanEvalFix
environments/code_debug_env/default.yaml: Configuration for agent prompts and tool inhibition
environments/code_debug_env/README.md: Usage and stabilization documentation
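
To make the idea of a regex-based fallback parser concrete, here is a minimal sketch. It is not the PR's parse_tool_calls_from_text: the function name `parse_tool_calls_sketch`, the returned dictionary shape, and the assumption that each <tool_code> block wraps a JSON object with `name` and `arguments` keys are all hypothetical and chosen only for illustration.

```python
import json
import re
from typing import Any

# Assumption: the model emits tool calls as JSON wrapped in <tool_code> tags.
# The real payload format used by the PR is not shown in this description.
TOOL_CODE_RE = re.compile(r"<tool_code>\s*(.*?)\s*</tool_code>", re.DOTALL)


def parse_tool_calls_sketch(text: str) -> list[dict[str, Any]]:
    """Extract tool calls from raw model text (hypothetical helper)."""
    calls: list[dict[str, Any]] = []
    for match in TOOL_CODE_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            # Skip malformed blocks instead of failing the whole turn.
            continue
        if isinstance(payload, dict) and isinstance(payload.get("name"), str):
            calls.append(
                {
                    "name": payload["name"],
                    "arguments": payload.get("arguments", {}),
                }
            )
    return calls


sample = (
    "I will run the tests first.\n"
    '<tool_code>{"name": "terminal", "arguments": {"command": "python tests.py"}}</tool_code>\n'
    "<tool_code>not json</tool_code>"
)
parsed_calls = parse_tool_calls_sketch(sample)
```

Skipping malformed blocks rather than raising keeps a single bad generation from aborting a multi-turn rollout; whether the actual parser behaves this way is not stated in the PR.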

How to Test

Rollout Verification

python environments/code_debug_env/code_debug_env.py process

vLLM Stability
- Verify that complex tool calls are correctly extracted from <tool_code> tags
- Confirm fallback parsing works when server-side parsing fails
Mock Trajectories
- Verify that tool calls (terminal/file) are correctly dispatched to the sandbox
- Ensure multi-step debugging runs complete without errors
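
One way to check the fallback behaviour listed under vLLM Stability is a small selection function that prefers server-side tool calls and only parses the text when none were returned. This is a sketch under assumptions: `select_tool_calls`, the `ToolCall` alias, and the stub parser are hypothetical names, and the actual control flow in environments/agent_loop.py may differ.

```python
from typing import Any, Callable, Optional

ToolCall = dict[str, Any]


def select_tool_calls(
    server_tool_calls: Optional[list[ToolCall]],
    content: Optional[str],
    text_parser: Callable[[str], list[ToolCall]],
) -> list[ToolCall]:
    """Prefer structured tool calls from the server; otherwise parse the text."""
    if server_tool_calls:
        return server_tool_calls
    # Server-side parsing returned nothing (or failed), so fall back to the client.
    return text_parser(content or "")


def stub_parser(text: str) -> list[ToolCall]:
    """Stand-in for a real client-side parser, used only for this check."""
    if "<tool_code>" in text:
        return [{"name": "terminal", "arguments": {}}]
    return []


server_result = select_tool_calls(
    [{"name": "file", "arguments": {}}], "ignored text", stub_parser
)
fallback_result = select_tool_calls(None, "<tool_code>...</tool_code>", stub_parser)
empty_result = select_tool_calls([], None, stub_parser)
```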

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass (Note: system-level version conflict in env, but rollout verified)
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: Ubuntu 22.04

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings)
I've updated cli-config.yaml.example — or N/A
I've updated CONTRIBUTING.md or AGENTS.md — or N/A
I've considered cross-platform impact (Windows, macOS) — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

N/A — This is an RL training environment integration.

Screenshots / Logs

Environment successfully executes multi-turn debugging sessions
Real terminal feedback verified
Running on Port 9001

cc @teknium1

Extends HermesAgentBaseEnv with:
- HumanEvalPack dataset (164 buggy Python functions)
- Workspace scaffolding (buggy.py + tests.py uploaded to sandbox)
- Multi-signal reward: test_signal (0.5), diagnosis (0.3), efficiency (0.2)
- Terminal + file toolsets for iterative debugging
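
As a rough illustration of the multi-signal reward, the sketch below reads the three numbers as weights in a linear combination. That reading is an assumption, as are the function name `combined_reward`, the clamping of each signal to [0, 1], and the way the signals would be computed; the environment's actual scoring code is not shown here.

```python
# Weights taken from the commit description; treating them as a weighted
# sum is an assumption of this sketch.
WEIGHTS = {"test_signal": 0.5, "diagnosis": 0.3, "efficiency": 0.2}


def combined_reward(test_signal: float, diagnosis: float, efficiency: float) -> float:
    """Combine three per-episode signals into one scalar (hypothetical helper)."""
    signals = {
        "test_signal": test_signal,
        "diagnosis": diagnosis,
        "efficiency": efficiency,
    }
    total = 0.0
    for name, weight in WEIGHTS.items():
        # Clamp each signal to [0, 1] so the total also stays in [0, 1].
        clamped = min(1.0, max(0.0, signals[name]))
        total += weight * clamped
    return total


# Tests pass, no useful diagnosis, moderate efficiency.
example_reward = combined_reward(test_signal=1.0, diagnosis=0.0, efficiency=0.5)
```

Under these assumptions, an episode that passes the tests but skips the diagnosis still earns most of the reward, which is consistent with test_signal carrying the largest weight.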

RUFFY-369 and others added 19 commits March 24, 2026 18:02

Merge branch 'NousResearch:main' into feat/code-debug-agent-env

eaf850e

chore:switch CodeDebugEnv to vllm server type for stable tool-calling

41e1f71

chore:use manual tool injection for CodeDebugEnv stability on vLLM

d3f959d

chore:explicitly set tool choice

f7a2ec6

fix: inhibit tools flag:

9232c50

fix:add extra body flag on env config too

a4fe80c

chore:add prints to see what model is responding

460ade6

feat:implement tool parser

0f7e0a1

fix:make parser more robust

ebb16d9

fix:import error

fb4449e

fix:env import issue

d24b85b

fix:final name error in logging

d05ab7a

docs: add README for CodeDebugEnv

60502cf

Merge branch 'NousResearch:main' into feat/code-debug-agent-env

77d935e

refactor:final production-ready audit; remove debug artifacts and non-ASCII characters

e1c4fe0

Merge branch 'NousResearch:main' into feat/code-debug-agent-env

19bcf94

style: apply black and ruff formatting for production standards

5416d9d

Merge branch 'NousResearch:main' into feat/code-debug-agent-env

4ff57ae

This was referenced Mar 27, 2026

feat(rl): implement TurnLevelReward infrastructure for Multi-Turn GRPO #3451

Open

feat: integrate CodeDebug (HumanEvalFix) environment and core server stabilization NousResearch/atropos#421

Open

RUFFY-369 wants to merge 19 commits into NousResearch:main from RUFFY-369:feat/code-debug-agent-env
